ABSTRACT



A document or sentence processing apparatus having an
input unit for inputting characters, a display unit for
displaying input characters and a processing unit for converting
and editing the input characters, in which the processing unit
has a candidate word extraction unit which extracts candidates
for words with omitted characters and/or for omitted
words themselves by referring to a vocabulary
dictionary storing words and their usage frequency and to a
dictionary of transition between words defining the
information on the transition between words and the probability
of the transition between words, and by searching the
vocabulary dictionary for the characters before and after the
elliptic character included in the input sentence, and a
determination unit which selects a single word among the
extracted candidate words by referring to the dictionary of
transition between words.
SENTENCE PROCESSING APPARATUS AND
METHOD THEREOF, UTILIZING
DICTIONARIES TO INTERPOLATE
ELLIPTIC CHARACTERS OR SYMBOLS

BACKGROUND OF THE INVENTION 

The present invention relates to an apparatus for allowing 
a user to input long words in a sentence in terms of elliptic 
characters without disturbing the continuity of thought. The 
apparatus according to the present invention is beneficial for 
increasing the speed and operability of inputting characters 
by way of a keyboard. It is also applicable for effecting an
increase in the input speed when using handwritten character
recognition or speech recognition and contributes to the 
increase in operability of the equipment. 

When inputting sentences using a word processor, it is 
often experienced that words related to private affairs, such 
as a job and a hobby and to a person's own name are 
repeatedly input. Especially in a case where those often-used 
character strings are long, it is a burden for the user to input 
repeatedly identical, long character strings. 

When using an apparatus which allows the user to input 
words by handwriting with a pen and tablet, since false 
recognition of characters input by the user may occur, the 
user has an increased burden in a case in which he or she 
inputs those characters and long sentences repeatedly. 

There are apparatuses that allow the user to input characters
or sentences with portions partially omitted in order to
reduce the user's burden.

For example, Japanese Patent Application Laid-Open
Number 7-191986 (1995) discloses a technology
which predicts an intended word and interpolates omitted
characters by referring to memories storing syntax coding
rules and word usage examples when the user inputs a
sentence including words with omitted characters. On the
other hand, Japanese Patent Application Laid-Open Number
5-28180 (1993) discloses a technology which prepares
a table storing combinations of adjacent words, such as noun
class → verb class and verb class → verbal phrase, and
interpolates omitted characters and predicts an intended word by
using this table.

As shown in the conventional technologies described 
above, word-to-word relation information between adjacent 
words is required to interpolate a sentence including omitted 
characters. For example, syntax coding rules and word
usage examples are used as this information in Japanese
Patent Application Laid-Open Number 7-191986 (1995), 
and combinations of adjacent words are used as this infor- 
mation in Japanese Patent Application Laid-Open Number 
5-28180 (1993). 

It is, however, necessary to prepare such word-to-word 
relation information by referring to a vast amount of refer- 
ence sentences, and it is not easy to prepare this information 
only by manual work. 

The conventional technologies described above assume
that a single word or character in a sentence is omitted, and
do not address the case in which a sentence with plural words
and/or characters omitted is interpolated.

SUMMARY OF THE INVENTION 

An object of the present invention is to provide an 
apparatus for interpolating a sentence in which plural words
and/or characters are omitted. 

Another object of the present invention is to provide an 
apparatus for extracting word-to-word relation information 
automatically and for preparing a dictionary. 
The above object can be attained by a document or
sentence processing apparatus having an input unit for
inputting characters, a display unit for displaying input
characters and a processing unit for converting and editing
the input characters, in which the processing unit includes a
candidate word extraction means which extracts candidates
for words with omitted characters and/or for omitted
words themselves by referring to a vocabulary dictionary
storing words and their usage frequency and to a dictionary of
transition between words defining information on the
transition between words and the probability of the
transition between words, and by searching the vocabulary
dictionary for the characters before and after the elliptic
character included in the input sentence, and a determination
means which selects a single word among the extracted candidate
words by referring to the dictionary of transition between
words.

The above object can also be attained by steps including a step
of decomposing the input sentence into single words and
storing coordinated pairs of the individual word and its
occurrence count, a step of searching the class of the particle
for the individual word and storing the count of transition
between words into the transition dictionary, a step of
extracting candidates for words with omitted characters
and/or for omitted words themselves by focusing on the
characters before and after an elliptic character included in
the input sentence and searching the vocabulary dictionary,
a step of selecting a single word among the extracted
candidate words by referring to the dictionary of transition
between words, and a step of modifying the occurrence
count of the selected word and modifying the transition
dictionary on the basis of the information on transition
between words in case the selected word is found in the
vocabulary dictionary.

BRIEF DESCRIPTION OF THE DRAWINGS 

By way of example and to make the description more
clear, reference is made to the accompanying drawings in
which:

FIG. 1 is a process diagram which shows an overall
procedure according to the present invention.

FIG. 2 is a process diagram which shows an overall 
operation according to the present invention.

FIG. 3 is a flowchart of the operations for building the
dictionary of the present invention. 

FIG. 4 is a table which shows rules for building the 
dictionary. 

FIG. 5 is a diagram which shows examples of the vocabu- 
lary dictionary and the transition dictionary. 

FIG. 6 shows examples for the vocabulary dictionary and 
the transition dictionary. 

FIG. 7 is a flowchart of the interpolation process for an
elliptic sentence.

FIG. 8 is a flowchart of the candidate word extraction 
process. 

FIG. 9 is a flowchart of the optimal candidate determi-
nation process. 

FIG. 10 is a diagram which shows a scheme for the 
determination of the optimal candidate. 

FIG. 11 is a flowchart of the dictionary building process. 

FIG. 12 is a diagram which shows a scheme for building
the vocabulary dictionary and the transition dictionary.

FIG. 13 is a diagram which shows a scheme for building 
the vocabulary dictionary and the transition dictionary. 




US 6,173,253 B1

FIG. 14 is a table which shows rules for learning words
for the dictionary.

FIG. 15 is a table which shows rules for learning words
for the dictionary.

FIG. 16 is a diagram which shows a procedure of the
operations for making an interpolated sentence.

FIG. 17 is a flowchart of the operations for making an
interpolated sentence.

FIG. 18 is a diagram which shows a scheme for selection
of candidates.

FIG. 19 is a diagram which shows a definition of elliptic
symbols.

FIG. 20 is a diagram which shows an application example
of the present invention.

FIG. 21 is a diagram which shows an application example
of the present invention.

FIG. 22 is a diagram which shows the vocabulary dictionary
and the transition dictionary.

FIG. 23 is a block diagram which shows an overall
configuration of the present invention.

PREFERRED EMBODIMENTS OF THE INVENTION

In this embodiment, what is explained is an elliptic
sentence interpolation and recognition apparatus which
extracts elliptic characters in a case in which a sentence is
input with some of its characters and/or words omitted, and
which completes the sentence by interpolating the omitted
words, and generates the dictionary for word interpolation
and learns words for the dictionary.

In the following, an apparatus in this embodiment
will be described with reference to the drawings.

FIG. 23 shows the overall configuration of an elliptic
sentence interpolation and recognition apparatus. The
component 2301 is a CPU, which reads out programs stored in
a ROM 2303 for generating sentences and executes the
programs. The component 2302 is a memory area in a RAM
for storing the sentences being processed. The component
2304 is an input unit including a keyboard, tablet, mouse and
another data input unit connected to hand-held data storage
apparatuses, such as a floppy disk drive. The component
2305 is a display unit, such as a CRT [illegible]
display. What is stored in ROM 2303 is a set of document
generation programs, specifically provided in accordance
with the present invention, for interpolating omitted
characters and/or words in an input sentence and for generating
and editing the vocabulary dictionary with learned words. In
addition, in case the input unit 2304 is formed as a
handwritten character recognition apparatus, there are programs
for extracting and recognizing strokes for a handwriting
recognition operation. The component 2306 is an external
data storage apparatus composed of hard disk drives which
contains a vocabulary dictionary used in the present
invention. The sentence interpolation process for interpolating
omitted characters and/or words in an elliptic sentence, the
dictionaries used in the sentence interpolation process and
the process for generating and editing the dictionaries with
learned words in the present invention will be described
hereinafter.

FIG. 1 shows an overall procedure for interpolating an
elliptic sentence and for generating and editing the
dictionaries with learned words. FIG. 2 shows a schematic
diagram of operations carried out by this procedure.

This procedure is composed of an interpolation process
150 for interpolating omitted characters and/or words in an
input word or sentence and for displaying the completed
word or sentence, a dictionary generation process 160 for
generating the dictionary used in the interpolation process
150, and a dictionary learning process 170 for updating the
dictionary for increasing the operating performance. The
interpolation process 150 further comprises a candidate
word extraction process 151 which extracts elliptic symbols
included in the word or sentence including omitted
characters and/or words and extracts the candidates of omitted
words; an optimal candidate determination means 152 for
extracting an optimal word from the extracted candidate
words and completing the interpolated sentence; and a
display control process 153 (FIG. 2) for controlling the
display of the interpolated sentence.

The dictionary generation process 160 is composed of a
morphological analysis process 161 for decomposing the
input sentence into individual words, a syntax analysis
process 162 for analyzing the structure of the sentence on the
basis of the individually separated words, an occurrence
count process 163 for counting the occurrence of words on
the basis of the result of the morphological analysis process
161 and a transition count process 164 for counting the count
of transition of words on the basis of the result of the syntax
analysis 162.

A vocabulary dictionary 131 for interpolating the sentence
in the interpolation process 150, a transition dictionary 132
for determining optimal words, and rules [illegible]
apparatus 2306 (HDD).

FIG. 2 shows overall operations of the elliptic sentence
interpolation and recognition apparatus. In FIG. 2 a symbol
wij is assigned to and shown below the individual Japanese
character. This symbol implies that this apparatus can be
applied to sentences other than Japanese language, for
example, English sentences. The suffix i represents an
individual word and the suffix j represents an individual
character; symbols w with an identical number i
represent a single word. Thus, for example, a unique
number j is assigned to each individual alphabetical
character in the English language, and a unique number i is
assigned to an individual word. The symbol wij appearing in
the figures described below has the same meaning
throughout.

Now, examples of operations of the apparatus will be
described. At first, an elliptic sentence 「[illegible]」 is input
from the input unit. A symbol 「-」 is defined as a character
representing omitted characters and/or words. In the
candidate word extraction process 151, words 「[illegible]」 are
extracted as candidates for 「[illegible]」. Next, in the optimal
candidate determination process 152, by referring to the
vocabulary dictionary 131 describing the occurrence count
representing the occurrence of words and the dictionary of
transition between words describing the transition count
between words which represents the occurrence of a
transition between words, optimal words in view of the context of
the sentence are determined from among the candidate words
extracted in the candidate word extraction process 151. A
display operation of the sentence with its omitted characters
and/or words interpolated in the above manner onto the
display unit is controlled by the display control process 153.

In case the interpolated sentence is different from that
expected by the user, the user modifies the undesired part of
the interpolated sentence. In case the user finds that the
interpolated sentence contains an undesired part, the user
specifies the undesired word by an input operation through
the input unit 2304. In accordance with this user's input
operation, the dictionary learning process 170 is started and
the candidate characters for the word are specified by the
display control process 153. For the candidate characters,
the words obtained by the candidate word extraction process
151 are displayed. The user selects his or her desired word
from among the words displayed as candidate words.
According to the words selected by the user's input
operation, the usage occurrence of the word in the vocabulary
dictionary and the transition probability in the dictionary
of transition between words are modified by the
dictionary learning process 170.

Next, the individual processes will be described. 

Dictionary Building

At first, the dictionary building process 160 for building
the vocabulary dictionary 131 and the dictionary 132 of
transition between words used for interpolation of the
omitted characters in the interpolation process 150 will be
described.

FIG. 3 is a flowchart showing an example of the process
for building dictionaries. In this example, what is described
is a case wherein a user builds a dictionary using sentences
defined by the user beforehand. At first, the sentence defined
beforehand is read out from the input unit 2304, if it is stored
in the external storage medium, or is read out from the
memory apparatus 2306, if it is stored in the memory
apparatus 2306 (step 301). In this example, what is
described is a case in which the sentence is
「[illegible]」 in the Japanese language. Next, in
the morphological analysis process 161, the read-out
sentence is decomposed into individually separated words by
morphological analysis based on the rules stored beforehand
in the memory apparatus 2306, and a sentence composed of
delimited individual words, that is, a word based delimited
sentence, is generated (step 302). In the morphological
analysis process 161, the input sentence
「[illegible]」 is transformed into a set of
delimited individual words, and each
part of speech in this sentence is interpreted as
[illegible] (noun) / [illegible] (Sa-series irregular
conjugation noun) / の (case postpositional particle) /
[illegible] (Sa-series irregular conjugation noun) /
[illegible] (supplementary postpositional particle).

In the syntax analysis process 162, the sentence read out
from the input unit 2304 or the memory apparatus 2306 is
parsed according to the rules stored beforehand in the
memory apparatus 2306, and a compound word based
delimited sentence including compound words, such as a
compound noun, is generated (step 303). As for the input
sentence 「[illegible]」, 「[illegible]」 is recognized
as a compound noun including a noun-noun structure, and a
compound word based delimited sentence
「[illegible]」 is obtained. Next, in the occurrence
count process 163 and the transition count process 164, by
referring to the word based delimited sentence and the
compound word based delimited sentence, the occurrence
count of words and compound words is measured and stored
into the vocabulary dictionary 131 (step 304). In addition,
the transition counts between words and the transition
counts between compound words are measured and stored
into the dictionary 132 of transition between words (step
305). A method of measuring the occurrence count and
transition count of words will be described later with reference
to FIG. 16.
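
The counting in steps 304 and 305 can be illustrated with a minimal sketch. It assumes the sentences have already been delimited into words (the morphological and syntax analyses are not modelled), and the function name build_dictionaries and the English sample words are hypothetical stand-ins, not taken from the specification.

```python
from collections import Counter

def build_dictionaries(delimited_sentences):
    """Count word occurrences (step 304) and word transitions (step 305).

    Each sentence is a list of already-delimited words; both the
    fine-grained and the coarse-grained delimitation may be passed in.
    """
    vocabulary = Counter()   # word -> occurrence count
    transitions = Counter()  # (previous word, next word) -> transition count
    for words in delimited_sentences:
        vocabulary.update(words)
        transitions.update(zip(words, words[1:]))
    total = sum(vocabulary.values())  # total occurrence count of all words
    return vocabulary, transitions, total

# Hypothetical English stand-ins for the two delimitations of one sentence.
fine = ["character", "recognition", "of", "speech"]
coarse = ["character recognition", "of", "speech"]
vocabulary, transitions, total = build_dictionaries([fine, coarse])
```

Feeding both delimitations into the same counters is one simple way to keep words and compound words side by side in a single pair of dictionaries.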

By building the dictionaries from both a fine-grained
(word-by-word) delimited sentence and a coarse-grained
(compound-word based) delimited sentence as
described above, the dictionaries can be used with higher
flexibility when interpolating an input sentence having
omitted characters and/or words.

FIG. 4 shows examples of rules for generating a
coarse-grained delimited sentence on the basis of compound words.
Methods for forming a coarse-grained delimited sentence
structure on the basis of compound words
include the following methods.

(1) A sentence is transformed into a coarse-grained
delimited sentence structure in such a way that a sequential
occurrence of nouns is interpreted as a compound noun
(for example, a sequence of 「[illegible]」 (noun) and
「[illegible]」 (noun) is interpreted as 「[illegible]」 (compound
noun)), and that a sequential occurrence of a Sa-series
irregular conjugation noun and 「する」 (Sa-series
irregular conjugation verb) is interpreted as a
compound verb (for example, a sequence of 「[illegible]」
(Sa-series irregular conjugation noun) and 「する」
(Sa-series irregular conjugation verb) is interpreted as
「[illegible]」 (Sa-series irregular conjugation verb)).
(2) A sentence is transformed into a coarse-grained
delimited sentence structure in such a way that a part
delimited by a postpositional particle and an auxiliary
verb is recognized as a single phrase.
A single delimited unit may be defined according to the
user's preference. In a case where the sentence is described
in a language other than the Japanese language and
grammatical rules other than those described above are applied,
the scheme described above can be adopted by properly
modifying the rules described above.
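
Rule (1) above can be sketched as follows. This is an illustrative sketch only: the tag names, the function merge_compounds, and the romanized sample words are assumptions made for the example, and giving the "noun + suru" pairing priority over noun-run merging is a design choice of this sketch, not something the specification states.

```python
NOUN_TAGS = ("noun", "verbal_noun")  # "verbal_noun" = Sa-series irregular conjugation noun

def merge_compounds(tagged):
    """Merge noun runs into compound nouns and 'verbal noun + suru' into compound verbs."""
    out = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        # Verbal noun followed by the suru verb -> one compound verb.
        if tag == "verbal_noun" and nxt is not None and nxt[1] == "suru_verb":
            out.append((word + nxt[0], "compound_verb"))
            i += 2
            continue
        # Noun directly after a noun (or compound noun) -> extend the compound noun.
        if tag in NOUN_TAGS and out and out[-1][1] in NOUN_TAGS + ("compound_noun",):
            out[-1] = (out[-1][0] + word, "compound_noun")
        else:
            out.append((word, tag))
        i += 1
    return out

# Hypothetical romanized input; the tags would come from morphological analysis.
tagged = [("moji", "noun"), ("ninshiki", "verbal_noun"), ("no", "particle"),
          ("setsumei", "verbal_noun"), ("suru", "suru_verb")]
coarse = merge_compounds(tagged)
```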
FIG. 5 shows structures of the vocabulary dictionary 131
and the dictionary 132 of transition between words, both of
which are built by the method described with reference to FIG.
3. The words (and compound words) appearing in the
sentence and their occurrence counts, and the total occurrence
count of all words, are stored in the vocabulary dictionary
131. Information on the transition between words and the
transition counts actually appearing in the sentence are stored
in the dictionary 132 of transition between words. For
ease of access to the dictionaries, it is desirable to arrange
those data in the order of the character code. FIG. 5 shows
a result of measuring the occurrence count of words (and
compound words) and the transition count between words
(and compound words) in the text string
「[illegible]」. In essence, the present invention can
be realized only by identifying those words, the occurrence
count of words (occurrence frequency), information on
transition between words and the transition count (transition
probability). However, it is possible to form the dictionaries
with an index as shown in FIG. 6 in order to facilitate the
processes.
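
A character index of the kind mentioned above might look like the following sketch. It assumes that each elliptic symbol stands for one or more omitted characters and that every word is indexed under each character it contains; the names build_char_index and candidates, and the English sample vocabulary, are hypothetical.

```python
import re
from collections import defaultdict

ELLIPTIC = "-"  # elliptic symbol standing for omitted characters

def build_char_index(vocabulary):
    """Map each character to the words containing it, in character-code order."""
    index = defaultdict(set)
    for word in vocabulary:
        for ch in word:
            index[ch].add(word)
    return {ch: sorted(words) for ch, words in index.items()}

def candidates(pattern, index):
    """Return indexed words that fit a pattern such as 'rec-n'."""
    parts = pattern.split(ELLIPTIC)
    literal = "".join(parts)
    if not literal:
        return []  # nothing to look up when only elliptic symbols were typed
    # Assumption: each elliptic symbol replaces at least one character.
    regex = re.compile(".+".join(re.escape(p) for p in parts))
    # Only the words indexed under the first typed character are scanned.
    return [w for w in index.get(literal[0], []) if regex.fullmatch(w)]

vocabulary = {"recognition": 8, "reception": 2, "record": 3, "of": 78}
index = build_char_index(vocabulary)
found = candidates("rec-n", index)  # ['reception', 'recognition']
```

Looking up one typed character first keeps the scan to the words that can possibly match, which is the point of the index.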

FIG. 6 shows structures of the indexed vocabulary
dictionary 131 and the indexed dictionary 132 of transition
between words. A major difference from the structure in
FIG. 5 resides in the fact that an index (pointer) is defined
in order to refer to the information on words in the
vocabulary dictionary 131 and the information on transition
between words in the dictionary 132 of transition between
words. A character contained in the individual word is
defined as an index (pointer) in the vocabulary dictionary
131, and is arranged in the order of the character codes. With
this configuration, a word having specified characters
(characters before and after the elliptic symbol) can be found
immediately. It can be seen in the example shown in FIG. 6
that the words containing the character 「[illegible]」 are
「[illegible]」 and 「[illegible]」. Since the information on an
individual word in the vocabulary dictionary 131 has an index
(pointer) for referring to the information on transition
between words that contains this word, it is also easy to refer
to the information on transition between words in the
dictionary 132 of transition between words after referring to
the vocabulary dictionary 131. In this example, it is found
that the word 「[illegible]」 comes after the word 「[illegible]」.

As described above, the vocabulary dictionary 131 and
the dictionary 132 of transition between words used in the
present invention can be formed.

Interpolation Process

Next, an interpolation process 150 for interpolating
characters and/or words in a sentence containing omitted
characters and/or words will be described. FIG. 7 shows a
flowchart of the operation of the interpolation process 150.

At first, a sentence including elliptic symbols is input by
the user's input operation through the input unit 2304 (step
701). Next, the candidate word extraction process 151
obtains plural words (or a single word) (candidate words)
containing the character(s) before and after the elliptic symbol
by referring to the vocabulary dictionary 131 (step 702).
Next, in the optimal candidate determination process 152,
the plausibility (defined as occurrence probability and so on)
of the sentence constructed by combining plural candidate
words is estimated by referring to the occurrence count
(usage frequency) of the word described in the vocabulary
dictionary 131 and to the transition count (transition
probability) between words described in the dictionary 132
of transition between words. Finally, the sentence (the
sentence with omitted characters and/or words being
interpolated) determined to be most probable (with the
highest occurrence probability) in the optimal
candidate determination process 152 is displayed (step 704).
What is described above is a basic scheme. Among those
processes, the candidate word extraction process 151 and the
optimal candidate determination process 152 are described
in detail.

FIG. 8 shows a flowchart describing detailed procedures of
the candidate word extraction process 151 and its operation.

At first, a part including elliptic symbols in the input
sentence is searched (step 801). In this step, since the symbol
「-」 is defined as an elliptic symbol, a symbol 「-」 between
「[illegible]」 and 「の」, and a symbol 「-」 after 「[illegible]」
are extracted. Next, a set of characters including the elliptic
symbol and the characters before and after it is generated
(step 802). In this example, the elliptic symbol 「-」 between
「[illegible]」 and 「の」 may be interpreted as either a part of
the word 「[illegible]-」 or a part of the word 「-の」 (and
furthermore, may be interpreted as a part of the word
「[illegible]-の」). The individual candidate words for each of
the generated words 「[illegible]-」 and 「-の」 are extracted by
referring to the vocabulary dictionary 131 (step 803). If the
string 「[illegible]-の」 forms a single word, a word extracted
for 「[illegible]-」 and a word extracted for 「-の」 are
identical to each other. In this case, those candidate words
are treated as a single group, and its probability is increased
when estimating the probability of words later. This will be
described in detail with reference to FIG. 14 later. Since the
words in the vocabulary dictionary 131 are indexed with
individual characters contained in the individual word, as
described in the structure of the vocabulary dictionary 131
shown in FIG. 6, the candidate words (words containing
characters 「[illegible]」) for interpolating elliptic symbols
contained in the parts 「[illegible]」 can be obtained
immediately by referring to the vocabulary dictionary 131.

FIG. 9 shows a flowchart describing detailed procedures of
the optimal candidate determination process 152.

By the process shown in FIG. 8, the candidate words for
interpolating the elliptic symbols are obtained by extracting
words for interpolating the elliptic symbols from the
vocabulary dictionary. In this example, 「[illegible]」 and
「[illegible]」 are obtained as candidate words. The number of
sentences obtained by combining those candidate words is 2
to the power 3, that is, 8, which include 「[illegible]」 and
「[illegible]」. In the optimal candidate determination process
152, the plausibility of those individual sentences is
estimated. As for the plausibility, the occurrence probability
of the sentence is estimated. The occurrence probability of
sentences composed of a series of words, w1, w2, . . . and wn
(in which wi is the i-th word) is expressed in terms of the
occurrence probability of words and the transition
probability between words as in the expression,

  Occurrence Probability (w1 w2 . . . wn)
    = Occurrence Probability (w1)
    × Transition Probability (w2|w1)
    × Transition Probability (w3|w2)
    × . . .
    × Transition Probability (wn|wn-1).

The occurrence probability of a word and the transition
probability between words can be obtained by the occurrence
count described in the vocabulary dictionary 131 and the
transition count described in the dictionary 132 of transition
between words. For example, the occurrence probability of
word wi and the transition probability between word wi-1
and word wi can be expressed as in

  Occurrence Probability (wi)
    = Occurrence Count (wi) / Total Occurrence Count and
  Transition Probability (wi|wi-1)
    = Transition Count (wi|wi-1) / Occurrence Count (wi-1).

FIG. 10 shows an estimation of the occurrence probability
of the character string (sentence) 「[illegible]」, in which, by
referring to the vocabulary dictionary 131 and the
dictionary 132 of transition between words, the computational
result is obtained as follows, where wa, 「の」 and wb denote
the three words of the sentence in order:

  Occurrence Probability (wa の wb)
    = Occurrence Probability (wa)
    × Transition Probability (wa → の)
    × Transition Probability (の → wb)
    = (Occurrence Count (wa) / Total Occurrence Count)
    × (Transition Count (wa → の) / Occurrence Count (wa))
    × (Transition Count (の → wb) / Occurrence Count (の))
    = (8/37582) × (6/8) × (2/78)
    ≈ 4.09 × 10^-6.

Similarly, the occurrence probability for
another character string (sentence) is obtained, and the
sentence having the highest occurrence probability is judged
to be the optimal interpolated sentence (in this
case, the optimal interpolated sentence is 「[illegible]」).

As described above, the apparatus of the present invention
determines the optimal word in considering the context of
the sentence including the elliptic symbols, and the optimal
word (as the first candidate) is displayed on the display unit
2305 by the display control process 704. It is possible to
display the second and third candidate words at the same
time.
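
The plausibility estimate of the optimal candidate determination process 152 can be sketched as below, under stated assumptions: the function names and the word labels "wa", "wa2", "wb", "wb2" are hypothetical, and only the counts 8, 6, 2, 78 and the total 37582 are taken from the FIG. 10 example; all other counts are made up for illustration.

```python
from itertools import product

def sentence_probability(words, vocabulary, transitions, total):
    """Occurrence probability of w1 times the chain of transition probabilities."""
    prob = vocabulary[words[0]] / total
    for prev, nxt in zip(words, words[1:]):
        # Transition Count (nxt | prev) / Occurrence Count (prev)
        prob *= transitions.get((prev, nxt), 0) / vocabulary[prev]
    return prob

def best_sentence(candidate_lists, vocabulary, transitions, total):
    """Try every combination of candidate words and keep the most probable one."""
    return max(product(*candidate_lists),
               key=lambda ws: sentence_probability(ws, vocabulary, transitions, total))

total = 37582
vocabulary = {"wa": 8, "no": 78, "wb": 5, "wa2": 3, "wb2": 4}
transitions = {("wa", "no"): 6, ("no", "wb"): 2,
               ("wa2", "no"): 1, ("no", "wb2"): 1}

p = sentence_probability(("wa", "no", "wb"), vocabulary, transitions, total)
best = best_sentence([["wa", "wa2"], ["no"], ["wb", "wb2"]],
                     vocabulary, transitions, total)
```

Enumerating every combination is adequate for a handful of candidates per elliptic symbol; with many symbols in one sentence a dynamic-programming search over the same probabilities would avoid the exponential growth, though the specification does not describe one.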
As described above, since the user does not need to select a
desired word one by one from among many candidate words,
and can input the sentence by inserting elliptic
symbols for the parts representing omitted characters and/or
words in long words, the system can automatically select an
optimal result for interpolating an elliptic sentence and
display the completed sentence. Owing to this system
operation, the user can continuously input the character
strings without his or her thought being interrupted (by
repetitive requests asking the user to specify the candidate
words and/or characters), and thus, the user's operability is
remarkably increased.

Dictionary Learning 

Next, what is described is an automated learning mecha- 
nism for a dictionary which enables the dictionary to accom- 
modate new words and their definitions and revise existing 
words and their definitions in response to the user's sentence 
input and his or her preference. 

FIG. 11 shows a flowchart of the procedure of the
dictionary learning process 170 for building a dictionary.

At first, whether the interpolated sentence obtained by
the interpolation process 150 is correct or not is judged (step
1101). This judgment is made by using one of the following
methods or a combination of them. In the first method, when the
user inputs a new sentence, the interpolated
sentence is judged to be correct. In the second method,
when the user's input is not detected for a definite
period of time, the interpolated sentence is judged to be
correct. In the third method, the interpolated
sentence is judged to be correct when the user interactively
verifies the interpolated sentence and inputs
the judgment result that the interpolated sentence
presented to the user is correct. For example, in the third
method, in case a display object accepting the user's
confirmation judgment is presented to the user on the display,
and the user directs this display object interactively, the
interpolated sentence is judged to be correct. If the
interpolated sentence is judged to be correct, learning by the
dictionary is processed by using this interpolated sentence
(step 1106). The occurrence count and transition count of the
word appearing in the completed sentence are measured, and
the occurrence count and transition count of the word
defined in the vocabulary dictionary are incremented (in this
case, the morphological analysis is no longer necessary
because the delimited positions for the individual words
in the sentence are already clarified).

On the other hand, in case the interpolated sentence is
judged to contain errors, the user is prompted to indicate,
using a pen, a keyboard or a mouse, the part of the
interpolated sentence which he or she wants to modify, and
then the dictionary learning process 170 will display
candidate words in response to the user's operation (step 1105).
If a desirable word is contained in the displayed candidate
words, the designated sentence is completed by using the
word selected by the user from among the candidate words
(step 1106). In case a desirable word is not contained in the
candidate words presented by the dictionary learning
process 170, the user is prompted to input a correct word by
using a pen or a keyboard (step 1104). Then, the interpolated
sentence with its correction completed, if necessary, is
learned (step 1106). The occurrence count and transition
count of the word appearing in the completed sentence are
measured, and the occurrence count and transition count of
the word defined in the vocabulary dictionary are
incremented (in this case, the morphological analysis is no longer
necessary because the delimited positions for the
individual words in the sentence are already clarified). In this
way, as the dictionary is repeatedly trained by learning new words
and their definitions, a dictionary capturing the user's
preference can be built up while the user uses the elliptic input
character interpolation recognition apparatus.

By referring to FIG. 12 and FIG. 13, the learning step (step 1106)
of the vocabulary dictionary 131 and the dictionary 132 of
transition between words will be described in detail. FIG. 12
shows the cases in which the dictionary is trained by the
dictionary learning process 170. Step 1201 is a case where the
interpolated sentence is judged to be incorrect and a correct
interpolated sentence is made by selecting from the displayed
candidate words or characters. Step 1202 is a case where the
interpolated sentence is judged to be incorrect and a correct
interpolated sentence is made by the user's input of new
characters, and step 1203 is a case where the interpolated
sentence is judged to be correct. Once a correct interpolated
sentence is prepared, new words and their definitions for the
dictionary are learned in the dictionary learning process 170.

In this embodiment, a learning step with the sentence
「[illegible]」 as an interpolated sentence is described with
reference to FIG. 13. When the sentence 「[illegible]」 is input,
the individual occurrence counts of the words 「[illegible]」 and
「[illegible]」 defined in the vocabulary dictionary 131 and the
total occurrence count of words are incremented. The occurrence
count after learning is expressed as follows:
(Occurrence Count after learning) = (Occurrence Count before
learning) + α, and
(Total Occurrence Count after learning) = (Total Occurrence Count
before learning) + α×n, in which n is the number of words
contained in the sentence to be learned.

Similarly, the transition counts for the transitions between the
words 「[illegible]」 and 「[illegible]」 defined in the
dictionary 132 of transition between words are incremented.
The transition count after learning is expressed as follows:

(Transition Count after learning) = (Transition Count before
learning) + α, in which α is the value by which a count is
incremented at a single learning step and can be chosen so as
to satisfy the condition α>0. The value of α may be made larger
in order to strengthen the learning effect, or smaller in order
to learn slowly.
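The basic learning rules above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name learn_sentence, the use of in-memory mappings for the two dictionaries, and the key "occurrence" for the total are hypothetical, and the sentence is assumed to be already divided into words.

```python
from collections import defaultdict


def learn_sentence(words, occurrence, transition, totals, alpha=1.0):
    """Apply the basic learning rules to one already-segmented sentence.

    words      -- list of the n words of the sentence, in order
    occurrence -- mapping word -> occurrence count
    transition -- mapping (previous word, next word) -> transition count
    totals     -- mapping holding the total occurrence count
    alpha      -- increment per learning step (alpha > 0)
    """
    if alpha <= 0:
        raise ValueError("alpha must satisfy alpha > 0")
    for word in words:
        occurrence[word] += alpha                 # count + alpha
    totals["occurrence"] += alpha * len(words)    # total + alpha x n
    for prev_word, next_word in zip(words, words[1:]):
        transition[(prev_word, next_word)] += alpha


occurrence = defaultdict(float)
transition = defaultdict(float)
totals = {"occurrence": 0.0}
learn_sentence(["w1", "w2", "w3"], occurrence, transition, totals)
```

A larger alpha makes each confirmed sentence weigh more heavily in the dictionaries, and a smaller alpha makes the dictionaries adapt more slowly.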

There are several methods for learning the user's input sentences
sequentially. FIG. 14 shows rules for learning dictionaries in
which the occurrence count and the transition count are normalized
sequentially every time learning occurs. In this method, as the
occurrence count and the transition count for the whole data are
normalized sequentially, the occurrence count and the transition
count do not exceed the allowable memory range of the memory
apparatus even if the number of learning incidences increases. The
occurrence count and the transition count after learning are
expressed as follows. First, the occurrence count of the words to
be learned (appearing in the user's input sentence) is expressed
as follows:

(Occurrence Count after learning) = ((Occurrence Count before
learning) + α)×(S/(S+α)), and
(Total Occurrence Count after learning) = ((Total Occurrence
Count before learning) + α)×(S/(S+α)).
The occurrence count of a word not to be learned (not appearing
in the user's input sentence) is only normalized and thus is
expressed as follows:

(Occurrence Count after normalization) = (Occurrence Count
before normalization)×(S/(S+α)).
Next, the transition count between words to be learned (appearing
in the user's input sentence) is incremented and normalized, and
thus is expressed as follows:

(Transition Count after learning) = ((Transition Count before
learning) + α)×(T/(T+α)).
The transition count between words not to be learned (not
appearing in the user's input sentence) is only normalized and
thus is expressed as follows:

(Transition Count after normalization) = (Transition Count
before normalization)×(T/(T+α)).
In the above expressions, α is the value by which a count is
incremented at a single learning step and can be chosen so as to
satisfy the condition α>0. The value of α may be made larger in
order to strengthen the learning effect, or smaller in order to
learn slowly. By choosing S to be smaller than the maximum value
that can be stored for an occurrence count and T to be smaller
than the maximum value that can be stored for a transition count,
the normalization operation can be done without exceeding the
memory range (in which case, however, the amount of calculation
for sequential normalization becomes larger).
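The sequential normalization rules can be sketched as one function that serves both dictionaries, with cap standing for S or T. The function name and the assumption that the rule is applied once per learned word (or per learned transition) are hypothetical and are not taken from the figures.

```python
def learn_with_sequential_normalization(key, counts, cap, alpha=1.0):
    """Add alpha to counts[key], then rescale every count by cap / (cap + alpha).

    counts -- mapping from a word (or a word pair) to its count
    cap    -- S for occurrence counts, T for transition counts
    """
    scale = cap / (cap + alpha)
    counts.setdefault(key, 0.0)  # a new entry starts from zero
    for k in counts:
        increment = alpha if k == key else 0.0  # only the learned entry gains alpha
        counts[k] = (counts[k] + increment) * scale


occurrence = {"word_a": 6.0, "word_b": 4.0}  # counts sum to S = 10
learn_with_sequential_normalization("word_a", occurrence, cap=10.0)
```

Under these assumptions, when the counts sum to S before a step they still sum to S after it, because (S + α)×(S/(S+α)) = S. The learned entry grows while every other entry shrinks slightly, which is how the counts stay inside the memory range.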

Next, a dictionary learning method in which normalization is
processed in batch will be described. The rules for this process
are shown in FIG. 15. In this method, normalization is applied to
the basic learning rules shown in FIG. 14 only if necessary (when,
as the number of learning incidents increases, the counts may
exceed the allowable memory range). The operation based on this
method will be described. First, (1) in case normalization is not
required, that is, the occurrence count and the transition count
do not exceed their maximum allowable memory ranges, the following
expressions are applied in a similar manner to that shown in
FIG. 13.

(Occurrence Count after learning) = (Occurrence Count before
learning) + α, and
(Total Occurrence Count after learning) = (Total Occurrence Count
before learning) + α×n, in which n is the number of words
contained in the sentence to be learned.

(Transition Count after learning) = (Transition Count before
learning) + α.
In the above expressions, α is the value by which a count is
incremented at a single learning step and can be chosen so as to
satisfy the condition α>0. The value of α may be made larger in
order to strengthen the learning effect, or smaller in order to
learn slowly.

Next, (2) in case normalization is required, that is, the
occurrence count and the transition count may exceed their
maximum allowable memory ranges, both the occurrence counts and
transition counts of the learned words and the occurrence counts
and transition counts of the words not learned are multiplied by
a normalization value w, and those counts are thereby normalized.
Thus, the occurrence count of the word to be learned (the word
appearing in the user's input sentence) is expressed as follows:

(Occurrence Count after learning) = ((Occurrence Count before
learning) + α)×w, and
(Total Occurrence Count after learning) = ((Total Occurrence
Count before learning) + α×n)×w.
The occurrence count of a word not learned is expressed as
follows:
(Occurrence Count after normalization) = (Occurrence Count
before normalization)×w.
The transition count between learned words (a transition between
words appearing in the user's input sentence) is expressed as
follows:
(Transition Count after learning) = ((Transition Count before
learning) + α)×w.
The transition count between words not learned (a transition
between words not appearing in the user's input sentence) is
expressed as follows:
(Transition Count after normalization) = (Transition Count
before normalization)×w,
in which w is a normalization constant defined so that 0<w<1. In
this method, since the normalization process is executed only if
it is required (in case the occurrence count and the transition
count may exceed their maximum allowable memory ranges), the
number of normalization processes executed is kept as small as
possible.
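The batch method can be sketched as follows. The function name, the parameter max_count standing for the maximum allowable memory range, and the choice to test the limit after the increment are assumptions made for illustration; the total occurrence count is left out for brevity and would be treated in the same way.

```python
def learn_with_batch_normalization(words, occurrence, transition,
                                   max_count, alpha=1.0, w=0.5):
    """Learn one segmented sentence, normalizing only when a count gets too large.

    Returns True if normalization was applied, False otherwise.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must satisfy 0 < w < 1")
    # Case (1): the basic learning rules.
    for word in words:
        occurrence[word] = occurrence.get(word, 0.0) + alpha
    for pair in zip(words, words[1:]):
        transition[pair] = transition.get(pair, 0.0) + alpha
    largest = max(list(occurrence.values()) + list(transition.values()),
                  default=0.0)
    if largest <= max_count:
        return False
    # Case (2): multiply every count, learned or not, by w.
    for table in (occurrence, transition):
        for key in table:
            table[key] *= w
    return True
```

Because the multiplication by w touches every entry, it is the expensive part; running it only when a limit is reached keeps the number of such passes small, which is the advantage the text attributes to this method.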

According to the above described embodiment, an elliptic sentence
interpolation and recognition apparatus can be so configured that
the apparatus has a learning mechanism with which the
dictionaries (the vocabulary dictionary 131 and the dictionary
132 of transition between words) used for interpolating the
elliptic symbols can be revised in response to the user's
preference while the user uses the dictionaries.

The basic procedures in the elliptic sentence interpolation and
recognition apparatus have been described in connection with the
above embodiment. Next, functions for extraction of words in
special cases, determination of optimal candidate words and
selection of a candidate word will be described. In the
following, a special case means that plural characters in a
candidate word are included in the user's input. FIG. 16 is
referred to for the following description. FIG. 16 shows a case
of building an interpolated sentence in response to the user's
input 「[illegible]」.

The candidate word extraction process 151 responds to this
user's input and generates character strings 「[illegible]」,
each including elliptic symbols, and then extracts the candidate
words corresponding to those character strings from the
vocabulary dictionary. In this example, it is assumed that the
words 「[illegible]」 are extracted for the character string
「[illegible]」, the words 「[illegible]」 are extracted for the
character string 「[illegible]」, and the words 「[illegible]」
are extracted for the character string 「[illegible]」.

The optimal candidate determination process 152 determines
optimal candidate words based on the extracted candidate words.
FIG. 17 shows a flowchart of the optimal candidate determination
process 152. When the optimal candidate determination process
152 receives the candidate words and the established characters
(「[illegible]」) in the elliptic sentence from the candidate
word extraction process 151, it counts the number of established
characters contained in each candidate word (step 1701). For
example, suppose that the candidate
word is 「[illegible]」; since the established character
「[illegible]」 is included in this word, the number of
established characters is 1. For all the candidate words, the
number of established characters for 「[illegible]」 is 1, that
for 「[illegible]」 is 1, that for 「[illegible]」 is 1, that for
「[illegible]」 is 2, and that for 「[illegible]」 is 1. The
certainty of the character string obtained by combining the
candidate word having the largest number of established
characters is incremented by adding α (step 1702). In this
example, the candidate word having the highest number of
established characters is 「[illegible]」, and the character
strings obtained by combining this candidate word are
「[illegible]」, and their certainty is incremented by adding α.
Next, the occurrence probability of the character strings
「[illegible]」 obtained by combining the candidate words for
「[illegible]」 is estimated by the above described method using
the occurrence count of words in the vocabulary dictionary 131
and the transition count between words in the dictionary 132 of
transition between words, and the estimated value for the
occurrence probability and the certainty are evaluated together
(step 1703). The character string having the highest estimated
value (the occurrence probability and the certainty) is taken to
be a candidate for the interpolated sentence. In this example,
the character string 「[illegible]」 is taken to be the first
candidate for the interpolated sentence.

As described above, the system can present an optimal word for
the interpolated sentence by respecting the user's preference
and the context of the user's input sentence.

FIG. 18 shows a method for displaying candidate words and
prompting the user to select one of them in case the candidate
word contains plural characters input by the user. In this
example, it is assumed that the input part is formed as a tablet
and the user inputs characters by handwriting. It is supposed
that an interpolated sentence 「[illegible]」 is obtained by the
process shown in FIG. 17 in response to the user's input
「[illegible]」. As described with reference to FIG. 17, the
candidate words 「[illegible]」 for the part 「[illegible]」, the
candidate words 「[illegible]」 for the part 「[illegible]」 and
the candidate word 「[illegible]」 for the part 「[illegible]」
are obtained in response to the user's input 「[illegible]」. If
the user requests display of the candidate words for
「[illegible]」 in the interpolated sentence, the candidate words
「[illegible]」 for the part 「[illegible]」, the candidate words
「[illegible]」 for the part 「[illegible]」 and the candidate
word 「[illegible]」 for the part 「[illegible]」 are displayed
(FIG. 18 at (2)). The user is prompted to select one of the
displayed candidate words. In the example shown in FIG. 18 at
(3), the word 「[illegible]」 is selected for the part
「[illegible]」 and the word 「[illegible]」 is selected for the
part 「[illegible]」. As shown in FIG. 18 at (4), 「[illegible]」
is displayed as the result. (Though not shown in the figure,
supposing that the user requests display of the candidate words
for 「[illegible]」 in the interpolated sentence, the candidate
words 「[illegible]」 for the part 「[illegible]」, the candidate
words 「[illegible]」 for the part 「[illegible]」 and the
candidate word 「[illegible]」 for the part 「[illegible]」 are
displayed, as is the case with the candidate words displayed for
「[illegible]」.)

The elliptic symbol is defined to be 「[illegible]」 in the above
description. It may be possible for the user to customize the
definition of the elliptic symbol. In this way, the user can
define the elliptic symbol by using a symbol which is never used
in his or her sentences.

FIG. 19 shows an input example in the case of defining the
elliptic symbol. In this example, the user defines three symbols
「[illegible]」, 「[illegible]」 and 「[illegible]」 as elliptic
symbols. (By defining such a specific character string as
「[illegible]」 as an elliptic symbol, the elliptic symbol can be
registered even in case of inputting all the characters.)

Next, a case in which the interpolation recognition process is
applied will be described. It often happens that an identical
character string is input repeatedly when inputting sentences
and data. By displaying often-used character strings on the
palette, the user can easily input sentences only by selecting
strings on the palette. FIG. 20 shows an example in which the
user inputs a 10-character-length string 「[illegible]」 by using
an input palette including words used frequently. FIG. 20 shows
a case wherein the user can input a character string
「[illegible]」 that is used frequently, such as one used in the
heading of a business document, only by hitting a button. In
preparing a character string palette including words used
frequently, character strings required for the system and used
frequently by the user are obtained by using the vocabulary
dictionary 131 and the dictionary 132 of transition between
words described in FIGS. 5 and 6, and the obtained character
strings having a designated string length (in this example, 10
characters) are displayed in descending order of occurrence
probability.

What has been described above concerns methods of inputting
character strings. Next, a description will be provided for
input methods for multimedia information including images and
sounds. The user may want to use and input certain pictures and
sounds when creating documents and home page contents. For
example, suppose that he or she wants to input pictures and
sounds that give an exhilarating impression. In this example,
what is proposed is a system in which, merely by inputting a
part of the character string representing the impression or
object of the multimedia source, for example, 「[illegible]」,
the user can obtain corresponding information related to the
pictures and sounds. FIG. 21 shows an example of this system, in
which an elliptic expression 「[illegible]」 of the adjective
「[illegible]」 representing an exhilarating feeling is input by
the user, and then a matched picture and musical sound are
presented to the user. Thus, the user is allowed to specify his
or her desired items and input multimedia information easily.

The structure of the dictionary (dictionary of information
indicating a relation between vocabulary and multimedia
information) required to realize the above described system is
shown in FIG. 22. The structure of this dictionary is similar to
the structure of the vocabulary dictionary 131 and the
dictionary 132 of transition between words described in FIG. 6.
Whereas the words in the vocabulary dictionary and the
transitions between words in the dictionary 132 of transition
between words are linked to each other in the example shown in
FIG. 6, here the words (words representing impressions of images
or sounds, or defining the names of the images or sounds
themselves) are linked to the contents of the multimedia data
bases 134 and 135 containing images and sounds instead of the
dictionary 132 of transition between words.

By using this dictionary, if the user inputs a character string,
for example, 「[illegible]」, a word 「[illegible]」 containing
the character 「[illegible]」 is obtained by referring to the
character index in the vocabulary dictionary 111. The entry
「[illegible]」 in the vocabulary dictionary 111 has pointers to
the images and sounds representing an exhilarating feeling or
impression, and thus, multimedia information which provides
exhilarating images and sounds can be retrieved immediately by
referring to the dictionary.
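The lookup through the character index can be sketched as follows. The symbol "~" standing for the elliptic symbol, the English sample word, the media identifiers and both function names are hypothetical; the matching rule (a word must contain every established character) is an assumption and not necessarily the rule used in FIG. 22.

```python
ELLIPTIC = "~"  # hypothetical stand-in for the elliptic symbol


def build_character_index(entries):
    """Map each character to the set of dictionary words that contain it."""
    index = {}
    for word in entries:
        for ch in set(word):
            index.setdefault(ch, set()).add(word)
    return index


def lookup_media(partial, entries, index):
    """Return the media pointers of every word containing all established characters."""
    established = [ch for ch in partial if ch != ELLIPTIC]
    if not established:
        return {}
    matches = set(entries)
    for ch in established:
        matches &= index.get(ch, set())
    return {word: entries[word] for word in sorted(matches)}


# Each entry points to images and sounds instead of to transitions between words.
entries = {"refreshing": ["image_01", "sound_01"], "calm": ["image_02"]}
index = build_character_index(entries)
found = lookup_media("re~", entries, index)
```

In this sketch the input "re~" reaches the entry "refreshing" through the index entries for "r" and "e", and the pointers stored in that entry give the associated image and sound directly.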

When building this dictionary, the associations between images
and sounds and the words representing their impressions may be
defined by the user or determined automatically by a system
capable of recognizing the impression of the individual image.
In case of capturing information in the WWW environment, as the
words and/or phrases near an image contained in the information
often represent the impression provided by the image, it may be
possible to associate the image with the words near the image.

As described above, what can be realized is a system in which
desired multimedia information including character strings,
images and sounds can be accessed only by specifying a part of
the character string without inputting the complete set of
characters.

When the user merely inputs a sentence in a batch manner
(without recognizing and selecting an individual candidate word
every time an individual word in the sentence is input) in which
elliptic symbols are inserted for specifying omitted characters
and/or words, the system determines and presents optimal words
for interpolating the elliptic parts. Thus, users can input
sentences in a batch manner without disturbing the continuity of
thought, and the operability is remarkably increased. As the
dictionaries used for interpolating the elliptic characters can
be built up automatically without the user's interaction and
become more and more intelligent as they are trained, the user
can operate the system comfortably.

While the described embodiment represents the preferred 
form of the present invention, it is to be understood that 
changes and variations may be made without departing from 
the spirit of the present invention. 

What is claimed is: 

1. A sentence processing apparatus comprising: 
an input unit for inputting characters,

a display unit for displaying said input characters, and
a processing unit for converting and editing said input 

characters, 
wherein said processing unit includes: 

candidate word extraction means which extracts can- 
didate words for an elliptic word by referring to a 
vocabulary dictionary storing a word and its usage 
frequency, to a dictionary of transition between 
words defining information on transition between
words and a probability of transition between words, 
and by searching the characters before and after the 
elliptic character included in the input sentence in the 
vocabulary dictionary, and 

determination means which selects a single word 
among said extracted candidate words by referring to 
said dictionary of transition between words. 

2. A sentence processing apparatus according to claim 1, 
wherein 

said input unit includes a tablet for allowing an input of 
words by handwriting, and 
said processing unit includes recognition means for
extracting and recognizing stroke information input by 
handwriting. 

3. A sentence processing apparatus according to claim 1, 
wherein 

said processing unit includes vocabulary dictionary building
means for decomposing an input sentence into
individual words, and storing an occurrence count of 
said individual word in said sentence and said indi- 
vidual word into said vocabulary dictionary. 

4. A sentence processing apparatus according to claim 1, 
wherein 

said processing unit includes means for building a dic- 
tionary of transition between words for decomposing 
an input sentence into individual words, and storing a 
transition count between said individual words in said 
sentence and said individual word into said dictionary 
of transition between words. 

5. A sentence processing method comprising: 

a step of decomposing an input sentence into individual 
words, and storing an occurrence count of an individual 
word in said sentence and said individual word into a 
vocabulary dictionary, 

a step of storing a transition count between said indi-
vidual words into a dictionary of transition between 
words and searching a class of a particle for said 
individually decomposed word, 

a step of extracting candidate words of omitted words by 
referring to said vocabulary dictionary on characters 
before and after an elliptic symbol included in said 
input sentence, and 

a step of determining a single word among said candidate 
words extracted on a basis of said dictionary of tran- 
sition between words. 

6. A sentence processing method comprising: 

a step of decomposing an input sentence into individual 
words, and storing an occurrence count of an individual 
word in said sentence and said individual word into a 
vocabulary dictionary, 

a step of storing a transition count between said indi-
vidual words into a dictionary of transition between 
words and searching a class of a particle for said 
individually decomposed word, 

a step of extracting a candidate of omitted words by 
referring to said vocabulary dictionary on characters 
before and after an elliptic symbol included in said 
input sentence, and 

a step of determining a single word among said candidate 
words extracted on a basis of said dictionary of tran- 
sition between words, wherein 

in a case where said determined word is found in said 
vocabulary dictionary, an occurrence count of said 
determined word is modified and said dictionary of 
transition between words is modified on a basis of
information on transition between words.